Abstract

This note provides very simple, efficient algorithms for computing the number of distinct longest 
common subsequences of two input strings and for computing the number of LCS embeddings. 
1 Background and Terminologies

Let $A = a_1 a_2 \cdots a_m$ and $B = b_1 b_2 \cdots b_n$ ($m \le n$) be two sequences over an alphabet $\Sigma$. A sequence that
can be obtained by deleting some symbols of another sequence is referred to as a subsequence of the original 
sequence. A common subsequence of A and B is a subsequence of both A and B. A longest common 
subsequence (LCS) is a common subsequence of greatest possible length. A pair of sequences may have 
many different LCSs. In addition, a single LCS may have many different embeddings, i.e., positions in the 
two strings to which the characters of the LCS correspond. 
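
To make the distinction between distinct LCSs and LCS embeddings concrete, the following exponential-time Python sketch enumerates index choices directly from the definitions. It is only an illustration for very short inputs and is not one of the algorithms of this note; the function name `brute_force_lcs_counts` is a hypothetical choice.

```python
from itertools import combinations


def brute_force_lcs_counts(a, b):
    """Return (LCS length, number of distinct LCSs, number of LCS embeddings).

    Illustrative only: tries every pair of equal-size position sets,
    so the running time is exponential in the input lengths.
    """
    for k in range(min(len(a), len(b)), -1, -1):
        distinct = set()   # distinct common subsequences of length k
        embeddings = 0     # matching (positions in a, positions in b) pairs
        for pos_a in combinations(range(len(a)), k):
            word = tuple(a[i] for i in pos_a)
            for pos_b in combinations(range(len(b)), k):
                if word == tuple(b[j] for j in pos_b):
                    distinct.add(word)
                    embeddings += 1
        if embeddings:
            # The first k (scanning downward) with any match is the LCS length.
            return k, len(distinct), embeddings
```

For instance, `brute_force_lcs_counts("aa", "aaa")` returns `(2, 1, 3)`: the single LCS `aa` has three embeddings. By contrast, `brute_force_lcs_counts("ab", "ba")` returns `(1, 2, 2)`: two distinct LCSs with one embedding each.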

Most investigations of the LCS problem have focused on efficiently finding one LCS. A widely familiar 
$O(mn)$ dynamic programming approach goes back at least as far as the early 1970s [7], and many later
studies have focused on improving the time and/or space required for the computation. Methods have also 
been developed to efficiently generate a listing of all distinct LCSs or all LCS embeddings in time proportional 
to the output size (plus a preprocessing time of $O(mn)$ or less) [j]. Here we show that the simplest
scheme can be simplified even further if we seek only a count of the number of distinct LCSs (or of the
number of LCS embeddings). We obtain a running time of $O(mn)$ and a space bound of $O(m)$. (While the
number of LCSs (or LCS embeddings) can grow very large as input size increases, the results here are
based on the standard assumption of unit time for any arithmetic operation without worrying about the 
possible magnitude of the operands.) 



2 Computing the Number of LCSs or LCS embeddings 

The familiar $O(mn)$ method for computing the length of an LCS is a "bottom-up" dynamic programming
approach based on the following recurrence for the length $L[i,j]$ of an LCS of $a_1 a_2 \cdots a_i$ and $b_1 b_2 \cdots b_j$:

\[
L[i,j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0 \\
L[i-1,j-1] + 1 & \text{if } i,j > 0 \text{ and } a_i = b_j \\
\max\{L[i-1,j],\, L[i,j-1]\} & \text{otherwise}
\end{cases}
\tag{1}
\]
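
As a minimal sketch of how recurrence (1) can be evaluated bottom-up, the Python function below fills the full table column by column. The name `lcs_length_table` and the use of 0-based Python strings (so that `a[i - 1]` plays the role of $a_i$) are choices made for this illustration.

```python
def lcs_length_table(a, b):
    """Return the (m+1) x (n+1) table L with L[i][j] as in recurrence (1)."""
    m, n = len(a), len(b)
    # Row 0 and column 0 stay at 0: the base case i = 0 or j = 0.
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, n + 1):
        for i in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L
```

For example, `lcs_length_table("bdcaba", "abcbdab")[6][7]` evaluates to 4.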
We can use a similar approach to devise an $O(mn)$ algorithm to compute the number of distinct LCSs
$D[m,n]$ of $a_1 a_2 \cdots a_m$ and $b_1 b_2 \cdots b_n$:

 1  for j ← 0 to n do
 2    for i ← 0 to m do
 3      if i = 0 or j = 0 then D[i,j] ← 1
 4      else
 5        D[i,j] ← 0
 6        if a_i = b_j then D[i,j] ← D[i-1,j-1]
 7        else
 8          if L[i-1,j] = L[i,j] then D[i,j] ← D[i,j] + D[i-1,j] endif
 9          if L[i,j-1] = L[i,j] then D[i,j] ← D[i,j] + D[i,j-1] endif
10          if L[i-1,j-1] = L[i,j] then D[i,j] ← D[i,j] - D[i-1,j-1] endif
11        endif
12      endif
13    endfor
14  endfor

(Note that there is always at least one LCS, since the empty string $\epsilon$ is always considered to be a common
subsequence of the input sequences.) 
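
The counting scheme for distinct LCSs can be sketched in Python as follows. This is an illustrative transcription under a few assumptions: the table `L` is computed in the same pass rather than beforehand, strings are 0-indexed (so `a[i - 1]` stands for $a_i$), and the function name `count_distinct_lcs` is hypothetical.

```python
def count_distinct_lcs(a, b):
    """Return the number of distinct longest common subsequences of a and b."""
    m, n = len(a), len(b)
    L = [[0] * (n + 1) for _ in range(m + 1)]  # LCS lengths, recurrence (1)
    D = [[1] * (n + 1) for _ in range(m + 1)]  # border entries stay 1 (empty LCS)
    for j in range(1, n + 1):
        for i in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
                D[i][j] = D[i - 1][j - 1]
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
                d = 0
                if L[i - 1][j] == L[i][j]:
                    d += D[i - 1][j]
                if L[i][j - 1] == L[i][j]:
                    d += D[i][j - 1]
                if L[i - 1][j - 1] == L[i][j]:
                    d -= D[i - 1][j - 1]  # remove LCSs counted by both neighbors
                D[i][j] = d
    return D[m][n]
```

For example, `count_distinct_lcs("bdcaba", "abcbdab")` returns 3 (the LCSs are `bcba`, `bcab`, and `bdab`), and `count_distinct_lcs("a", "aa")` returns 1.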

In the pseudocode above, line 5 could have been moved inside the else clause beginning at line 7, but
the pseudocode as written is particularly easy to modify for computation of the number of LCS embeddings
rather than the number of distinct LCSs; just replace that else with the endif from line 11. (The test in
line 10 is never satisfied with $a_i = b_j$, but it is harmless and concise to write the code this way.)
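
A minimal Python sketch of the embedding-counting variant follows. It keeps full tables for clarity and differs from the distinct-LCS computation only in that the three neighbor tests run whether or not the current characters match. The name `count_lcs_embeddings`, the table name `E`, and the in-pass computation of `L` are assumptions of this illustration.

```python
def count_lcs_embeddings(a, b):
    """Return the number of LCS embeddings of a and b."""
    m, n = len(a), len(b)
    L = [[0] * (n + 1) for _ in range(m + 1)]  # LCS lengths, recurrence (1)
    E = [[1] * (n + 1) for _ in range(m + 1)]  # border entries stay 1
    for j in range(1, n + 1):
        for i in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
                e = E[i - 1][j - 1]  # embeddings that match a_i with b_j
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
                e = 0
            # These tests are applied in both cases.
            if L[i - 1][j] == L[i][j]:
                e += E[i - 1][j]
            if L[i][j - 1] == L[i][j]:
                e += E[i][j - 1]
            if L[i - 1][j - 1] == L[i][j]:
                e -= E[i - 1][j - 1]  # never true when the characters match
            E[i][j] = e
    return E[m][n]
```

For example, `count_lcs_embeddings("aa", "aaa")` returns 3, since the single LCS `aa` can be matched against three different pairs of positions in `aaa`.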

We may also note that $O(m)$ space suffices for the computation, since we really only need a portion of
two columns of the L and D arrays at any time. Here is a rewrite of the code to achieve $O(m)$ space that
also introduces the necessary change to switch from computing the number of distinct LCSs to computing
the number of LCS embeddings $E[m]$; we also include the efficient computation of the L values:



 1  for j ← 0 to n do
 2    for i ← 0 to m do
 3      if i = 0 or j = 0 then L[i] ← 0, oldL ← 0, E[i] ← 1, and oldE ← 1
 4      else
 5        newL ← max{L[i-1], L[i]} and newE ← 0
 6        if a_i = b_j then newL ← oldL + 1 and newE ← oldE endif
 7        if L[i-1] = newL then newE ← newE + E[i-1] endif
 8        if L[i] = newL then newE ← newE + E[i] endif
 9        if oldL = newL then newE ← newE - oldE endif
10        oldL ← L[i], oldE ← E[i], L[i] ← newL, and E[i] ← newE
11      endif
12    endfor
13  endfor
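
The linear-space scheme can be sketched in Python roughly as follows. The function name `count_lcs_embeddings_linear_space` and the choice to return both the LCS length and the count are assumptions of this illustration. Python integers are arbitrary precision, so very large counts are represented exactly, although the unit-time arithmetic assumption then no longer holds literally.

```python
def count_lcs_embeddings_linear_space(a, b):
    """Return (LCS length, number of LCS embeddings) using O(len(a)) space."""
    m, n = len(a), len(b)
    # Before position i is updated in column j, L[i] and E[i] hold the
    # column j-1 values; afterwards they hold the column j values.
    L = [0] * (m + 1)
    E = [1] * (m + 1)
    for j in range(1, n + 1):
        old_L, old_E = 0, 1  # values at (0, j-1)
        for i in range(1, m + 1):
            new_L, new_E = max(L[i - 1], L[i]), 0
            if a[i - 1] == b[j - 1]:
                new_L, new_E = old_L + 1, old_E
            if L[i - 1] == new_L:
                new_E += E[i - 1]
            if L[i] == new_L:
                new_E += E[i]
            if old_L == new_L:
                new_E -= old_E
            # Save the column j-1 values at row i for use at row i+1.
            old_L, old_E = L[i], E[i]
            L[i], E[i] = new_L, new_E
    return L[m], E[m]
```

For example, `count_lcs_embeddings_linear_space("aa", "aaa")` returns `(2, 3)`. Passing the shorter string as the first argument keeps the two working arrays at the smaller of the two lengths.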